Model Architectures: Part 1

Neon supports building more complex models than a simple linear sequence of layers. In this series of notebooks, you will implement several such models and learn how data should be passed when a model has multiple inputs or outputs.

Tree Models

Neon supports models with a main trunk that includes branch points to leaf nodes. In this scenario, the model takes a single input but produces multiple outputs that can be matched against multiple targets. For example, consider the topology below:

cost1      cost3
  |          /
 m_l4      b2_l2
  |        /
  | ___b2_l1
  |/
 m_l3       cost2
  |          /
 m_l2      b1_l2
  |        /
  | ___b1_l1
  |/
  |
 m_l1
  |
  |
 data

Suppose we wanted to apply this model to the MNIST dataset. The MNIST data iterator returns, for each minibatch, a tuple of tensors (X, Y). Since there are multiple outputs, the single target labels Y are matched against all of these outputs. Alternatively, we could write a custom iterator that yields, for each minibatch, a nested tuple (X, (Y1, Y2, Y3)). Then, each target label will be mapped to its respective output layer.
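
One way to build such an iterator is to wrap an existing one. Below is a minimal sketch that replicates the single MNIST target once per output layer; the class name and structure here are illustrative, not part of neon's API:


# hypothetical wrapper: replicate the single target for each output layer
class ReplicatedTargetIterator(object):
    def __init__(self, base_iter, ntargets=3):
        self.base_iter = base_iter
        self.ntargets = ntargets

    def __getattr__(self, attr):
        # delegate attributes such as ndata, nbatches and reset()
        # to the wrapped iterator
        return getattr(self.base_iter, attr)

    def __iter__(self):
        for (X, Y) in self.base_iter:
            # yield a nested tuple: one copy of Y per output layer
            yield (X, tuple(Y for _ in range(self.ntargets)))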

We will guide you through implementing such a branching model. We first import all the needed ingredients:


In [ ]:
from neon.callbacks.callbacks import Callbacks
from neon.initializers import Gaussian
from neon.layers import GeneralizedCost, Affine, Multicost, SingleOutputTree
from neon.models import Model
from neon.optimizers import GradientDescentMomentum
from neon.transforms import Rectlin, Logistic, Softmax
from neon.transforms import CrossEntropyBinary, CrossEntropyMulti, Misclassification
from neon.backends import gen_backend

We also set up the backend and load the data.


In [ ]:
be = gen_backend(batch_size=128)

from neon.data import MNIST

mnist = MNIST(path='data/')
train_set = mnist.train_iter
valid_set = mnist.valid_iter

Now it's your turn! Set up the branch nodes and layer structure shown above. Some tips:

  • Use Affine layers.
  • You can choose your own hidden unit sizes; just make sure that the three final output layers have 10 units, one for each of the 10 MNIST categories.
  • The three final output layers should also use Softmax activation functions so that the output probabilities sum to 1.

As a reminder, to define a single layer, we need a weight initialization and an activation function:


# define a layer
layer1 = Affine(nout=100, init=Gaussian(scale=0.01), activation=Rectlin())

# alternatively, you can take advantage of common parameters by collecting
# them in a dictionary:
init_norm = Gaussian(loc=0.0, scale=0.01)
normrelu = dict(init=init_norm, activation=Rectlin())

# pass the dictionary to the layers as keyword arguments using the ** syntax.
layer1 = Affine(nout=100, **normrelu)
layer2 = Affine(nout=10, **normrelu)

To set up a simple Tree:

# define a branch node
b1 = BranchNode(name="b1")

# define the main trunk
path1 = [layer1, b1, layer2]

# define the branch (layer3 is an Affine layer defined like the ones above)
layer3 = Affine(nout=10, **normrelu)
path2 = [b1, layer3]

# build the model as a Tree
# alphas are the weights given to the branches of the Tree during backpropagation
model = Model(layers=SingleOutputTree([path1, path2], alphas=[1, 1]))

Below is a skeleton of the code for you to fill out to build the model above.


In [ ]:
from neon.layers import BranchNode
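# reset neon's registry of branch nodes (they are cached by name) so that
# re-running this cell creates fresh nodes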
BranchNode.instances = dict()

# define common parameters as dictionary (see above)
init_norm = Gaussian(loc=0.0, scale=0.01)

normrelu = dict(init=init_norm, activation=Rectlin())
normsigm = dict(init=init_norm, activation=Logistic(shortcut=True))
normsoft = dict(init=init_norm, activation=Softmax())

# define your branch nodes
# branch nodes need to have a unique name
b1 = BranchNode(name="b1")
b2 = BranchNode(name="b2")

# define the main trunk (cost1 above)
...

# define the branch (cost2)
...

# define the branch (cost3)
...

# build the model as a SingleOutputTree
...
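
If you get stuck, one possible completion is sketched below. The hidden sizes (100 units) and the equal alphas are our choices, not the only valid ones; any sizes work as long as each of the three output layers has 10 Softmax units:


# one possible completion (hidden sizes and alphas are our choices)
p1 = [Affine(nout=100, name="m_l1", **normrelu),
      b1,
      Affine(nout=100, name="m_l2", **normrelu),
      Affine(nout=100, name="m_l3", **normrelu),
      b2,
      Affine(nout=10, name="m_l4", **normsoft)]

p2 = [b1,
      Affine(nout=100, name="b1_l1", **normrelu),
      Affine(nout=10, name="b1_l2", **normsoft)]

p3 = [b2,
      Affine(nout=100, name="b2_l1", **normrelu),
      Affine(nout=10, name="b2_l2", **normsoft)]

model = Model(layers=SingleOutputTree([p1, p2, p3], alphas=[1, 1, 1]))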

Now let's fit our model! First, set up a cost for each of the three outputs using Multicost:


In [ ]:
cost = Multicost(costs=[GeneralizedCost(costfunc=CrossEntropyMulti()),
                        GeneralizedCost(costfunc=CrossEntropyMulti()),
                        GeneralizedCost(costfunc=CrossEntropyMulti())])

To test that your model was constructed properly, we first initialize the model with a dataset (so that it configures the layer shapes appropriately) and a cost, then print the model.


In [ ]:
model.initialize(train_set, cost)
print(model)

Then, we set up the remaining components and run fit!


In [ ]:
# setup optimizer
optimizer = GradientDescentMomentum(0.1, momentum_coef=0.9)

# setup standard fit callbacks
callbacks = Callbacks(model, eval_set=valid_set, eval_freq=1)
model.fit(train_set, optimizer=optimizer, num_epochs=20, cost=cost, callbacks=callbacks)
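
After training, we can also measure the misclassification error on the validation set using the Misclassification metric imported earlier. Since a SingleOutputTree emits only the main trunk's output at inference time, this error reflects the trunk's predictions.


In [ ]:
error_pct = 100 * model.eval(valid_set, metric=Misclassification())
print('Misclassification error = %.1f%%' % error_pct)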

Try to adjust the model architecture and hyperparameters to reach a cross-entropy loss of 0.16 or below. When re-running the model, we suggest restarting the IPython kernel ([Kernel -> Restart & Run All]); branch nodes are cached by name, so stale instances can otherwise linger between runs.